Steady-state mRNA levels are tightly regulated through a combination of transcriptional and
post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that
encode these control mechanisms is of high importance. We have investigated the influence
of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient
whole genome duplication event, on the breadth of gene expression and the rates of mRNA
decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated
with a decrease in breadth of gene expression and slower mRNA decay rates, while
the presence of CNSs near α duplicates was associated with an increase in breadth of gene
expression and faster mRNA decay rates. The mRNA decay rate was fastest in genes
with CNSs in both non-transcribed and transcribed regions, albeit through
an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate
steady-state mRNA levels through post-transcriptional control mechanisms and
that CNSs also play a role in controlling the breadth of gene expression.
INTRODUCTION

Duplication of genetic material has been proposed to be one of 
the primary evolutionary factors driving organism complexity and 
occurs at various scales ranging from single gene transpositions to 
whole genome duplication (WGD) events (Freeling and Thomas,
2006; Edger and Pires, 2009; Freeling, 2009; Schnable et al., 2011;
Woodhouse et al., 2011). Instances of WGD are particularly prevalent
in plants, as roughly 35% of flowering plants are polyploid
relative to their basal genera, and nearly all angiosperms have
experienced an ancestral WGD (Semon and Wolfe, 2007; Wood
et al., 2009; Paterson et al., 2010; Jiao et al., 2011). Duplicate gene
pairs that are retained post-duplication are expected to have either 
developed novel function (neofunctionalization) or distributed 
function between duplicated gene pairs (subfunctionalization) 
(Ohno, 1970; Force et al., 1999). The most likely outcome from a 
duplication event is the loss of additional genetic material through 
pseudogenization or deletion (fractionation) (Haldane, 1933; Nei 
and Roychoudhury, 1973; Freeling et al., 2012). However, many 
duplicated genes are enriched for particular biological functions 
(e.g., transcription factors, kinases, stress response), which suggests
a more complex mechanism for gene retention (Blanc and
Wolfe, 2004; Seoighe and Gehring, 2004; Zou et al., 2009).

The retention of specific functional classes encoded in duplicated
genes suggests the fractionation process may involve a combination
of factors including environmental cues, gene duplication
scale (e.g., single gene transposition vs. WGD), and relative levels of
gene expression (Birchler et al., 2005; Zou et al., 2009; Wang et al.,
2011; Yang and Gaut, 2011). For instance, genes duplicated in a
WGD event are thought to be retained more frequently than genes
from discrete duplication events, as WGD events would copy all
flanking DNA that contains regulatory information (Schnable
et al., 2011; Wang et al., 2011). Genes retained from WGD events in
Arabidopsis and Oryza are consistent with this hypothesis, as they
are less likely to display divergent expression patterns than duplicated
genes from small-scale events (Casneuf et al., 2006; Wang
et al., 2011). Through the study of conserved non-coding DNA
sequence flanking duplicated loci (CNS elements), it is possible
to identify specific regulatory motifs copied and retained after the
duplication event.

Arabidopsis thaliana provides an excellent system to interpret
the consequences of massive-scale gene duplication, as there have
been three WGD events (Bowers et al., 2003; Maere et al., 2005;
Barker et al., 2009). The most recent WGD in the Arabidopsis
lineage was an ancient tetraploidy event that occurred roughly
23.2 Mya [α duplication event; (Bowers et al., 2003; Maere et al.,
2005; Jiao et al., 2011)]. Remnants of the α event can be detected in
the form of duplicate gene pairs (α duplicates) and CNS elements
that have resisted fractionation (Thomas et al., 2007). Briefly, α
duplicate CNS elements between 15 and 285 bp in length were
discovered as local alignment high-scoring segment pairs between
two α duplicate homeologs that did not overlap protein coding or
transposon DNA.

The discovery of function encoded in CNS elements is an active 
area of research, as their discovery in Arabidopsis occurred within 
the last decade (Thomas et al., 2007). Recently, we identified a 
link between conserved non-coding sequences (CNSs) and the 
regulation of expression intensity, maintenance of co-expression 
between duplicate gene pairs, and association with known gene 
regulatory networks (Spangler et al., 2012a,b). Roughly half of
the annotated CNSs contain known transcription factor binding
sites (TFBS), although not all of the TFBS are functional (Freeling
et al., 2007; Spangler et al., 2012a,b). We hypothesized that some
intronic CNSs could be encoding intron-mediated enhancement
(IME) regulatory mechanisms (Spangler et al., 2012b). Moreover,
it was previously shown that CNSs were not related to small RNAs
or transposable elements (Thomas et al., 2007). The contribution
of CNSs to the regulation of gene expression is clear, but knowledge 
of the specific underlying regulatory mechanisms is incomplete. 

While much focus on the regulation of mRNA levels has been
at the transcriptional level, an increasing number of studies have
focused on post-transcriptional control of steady-state mRNA levels
(Shalem et al., 2008; Elkon et al., 2010; Vogel et al., 2010). The
rates of mRNA degradation have been found to respond to various
environmental and stress conditions, such as DNA damage, oxidative
stress, and chemical exposure (Shalem et al., 2008; Elkon et al.,
2010). Biological function also appears correlated with mRNA stability.
Genes involved in metabolism tend to have longer half-lives,
while regulatory genes tend to have shorter half-lives (Wang et al.,
2002; Yang et al., 2003). Narsai et al. (2007) calculated the rates
of decay for over 13,000 Arabidopsis genes and found the median
half-life to be 3.8 h. While Narsai et al. focused on identifying
DNA sequence elements in the 5'- and 3'-UTRs associated with
mRNA decay rates, their analyses did not include gene duplication
status or the presence of CNSs. Given the association of CNS
position near α duplicates with predicted free folding energies of
5'-UTRs (Spangler et al., 2012b), we investigated whether CNSs
play a role in mRNA stability.

The focus of this study was to examine potential post-transcriptional
control of gene expression encoded in CNSs
located near α duplicate gene coding sequences. We hypothesized
that regulatory motifs encoded in some CNS elements control the
steady-state mRNA levels in Arabidopsis at the level of RNA stability.
We tested this hypothesis using the RNA decay information
from Narsai et al. (2007), the most recent CNS annotation in Arabidopsis,
and a collection of 7,158 publicly available microarray expression
profiling datasets. We examined the effect of CNS gene position
on the rate of mRNA decay and breadth of gene expression.

RESULTS 

GENE CHARACTERISTICS AND mRNA DECAY RATE 

Whole genome duplicate gene pairs derived from the α duplication
event (α duplicates) exhibit higher average levels of expression
than other genes in Arabidopsis (Wang et al., 2011; Yang and Gaut,
2011). We had previously associated CNSs with changes in average
expression intensity (AEI) and hypothesized that CNSs may influence
mRNA stability (Spangler et al., 2012b). In a simple system,
the steady-state mRNA concentration can be considered a combination
of the rate of transcription and the rate of mRNA decay. We
decided to test if the presence of CNSs was associated with changes
in mRNA decay rates. To do this we collected the mRNA half-lives
of 12,189 Arabidopsis genes from Narsai et al. (2007). Within these
12,189 genes there was a significant correlation
between AEI and mRNA half-life across 7,016 processed microarray
datasets (Spearman's rho = 0.462; p < 2.2 × 10^-16; Figure 1),
supporting the idea that AEI could be partially explained by the
rate of mRNA decay.
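
The simple steady-state model mentioned above can be illustrated with a short sketch. It assumes a constant transcription rate and first-order decay, which the text does not state explicitly; the function names and the unit transcription rate are hypothetical and serve only as an illustration. The two half-lives are the medians reported in Table 1.

```python
import math

def decay_constant(half_life_h):
    """First-order decay constant (per hour) for a half-life given in hours."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_h

def steady_state_level(transcription_rate, half_life_h):
    """Steady-state mRNA level under the assumed model dm/dt = k_tx - k_deg * m."""
    return transcription_rate / decay_constant(half_life_h)

# Illustration only: one hypothetical transcription rate combined with the
# median half-lives of CNS negative (5.02 h) and CNS positive (3.57 h) duplicates.
slow = steady_state_level(1.0, 5.02)
fast = steady_state_level(1.0, 3.57)
print(slow / fast)  # under this model, the ratio of the two half-lives
```

Under these assumptions a longer half-life raises the steady-state level in direct proportion, which is the intuition behind relating AEI to mRNA decay.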

Conserved non-coding sequences have been identified in
all subgene positions relative to α duplicates [5'-upstream, 5'-UTR,
intron, 3'-UTR, and 3'-downstream (Thomas et al., 2007;
Spangler et al., 2012b)]. While only ~34% of CNSs are located
within transcribed subgene positions (5'-UTR, intron, and 3'-UTR),
each of these regions has been associated with changes
in mRNA stability independent of CNS annotation (Decker and
Parker, 1993; Peng et al., 1998; Lindquist et al., 2004; Meng et al.,
2005; Wang et al., 2005; Narsai et al., 2007). For example, Narsai
et al. (2007) found that the absence of an intron was sufficient to
decrease mRNA half-life, and this pattern was maintained
with updated Arabidopsis annotation [TAIR10; Figure 2A;
Kolmogorov-Smirnov test p-value (KS-p) < 2.20 × 10^-16].
Notably, the absence of an annotated 5'-UTR or 3'-UTR was
also sufficient to decrease mRNA stability (Figures 2B,C;
KS-p < 2.20 × 10^-16 in both cases). With the
objective of identifying changes in mRNA stability that could be
attributed to CNS presence, we therefore limited our analyses to the
9,958 genes measured by Narsai et al. that contained annotated
5'-UTR, intron, and 3'-UTR sequences. The list of 9,958 genes was
separated into three categories based on gene duplication status:
α duplicates, singletons, and non-α duplicates. We considered
a p-value < 0.001 significant for all comparisons.

CNS PRESENCE AND mRNA DECAY RATE 

In order to examine if CNSs alter the rate of mRNA decay we separated
α duplicates into two gene subsets based on CNS presence.
We found CNS negative α duplicates (α duplicates with no CNSs)
had an increased mRNA half-life relative to all genes (median 5.02
and 4.11 h, respectively; KS-p = 1.06 × 10^-9; Figure 3A; Table 1).
Notably, CNS positive α duplicates (α duplicates with at least one
CNS) had a decreased mRNA half-life relative to all genes (median
3.57 and 4.11 h, respectively; KS-p = 7.81 × 10^-7; Figure 3A;
Table 1). The difference in mRNA half-life between CNS positive
α duplicates and CNS negative α duplicates was also significant
(median 3.57 and 5.02 h, respectively; KS-p = 5.33 × 10^-15;
Table 1; Table S1 in Supplementary Material).
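
The comparisons above rely on the two-sample KS test. A minimal sketch of such a comparison is given here using only the Python standard library. The asymptotic p-value approximation and the function name are assumptions made for illustration and are not taken from the analysis pipeline used in this study; the half-life lists are made-up values, not data from Narsai et al.

```python
import math
from bisect import bisect_right

def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p-value) for a two-sample KS test.

    D is the largest vertical distance between the two empirical CDFs.
    The p-value uses the asymptotic Kolmogorov series with a small-sample
    correction, so it is only an approximation (an assumption of this sketch).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / n  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / m  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is essentially 1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)

# Hypothetical half-lives (hours) for two gene subsets.
subset_1 = [2.1, 2.9, 3.4, 3.6, 4.0, 4.4]
subset_2 = [3.9, 4.8, 5.0, 5.3, 6.1, 7.2]
d_value, p_value = ks_two_sample(subset_1, subset_2)
print(d_value, p_value)
```

The D-value returned by this sketch corresponds to the distance between distributions described in the Figure 3 legend.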

As AEI can vary based on CNS subgene position, we looked
for a similar effect on the rate of mRNA decay by examining
the half-lives of α duplicates with only non-transcribed CNSs,
α duplicates with only transcribed CNSs, and α duplicates with
both non-transcribed and transcribed CNSs. There was no significant
difference in mRNA half-life for α duplicates with only non-transcribed
CNSs relative to all genes (median 3.51 and 4.11 h, respectively;
KS-p = 1.03 × 10^-3; Figure 3B; Table 1). The mRNA half-life for α
duplicates with only non-transcribed CNSs was significantly lower
relative to CNS negative α duplicates (3.51 and 5.02 h, respectively;
KS-p = 2.92 × 10^-8; Table 1; Table S1 in Supplementary
Material). No significant change was observed in mRNA half-life
between α duplicates with only transcribed CNSs relative
to all genes (median 4.24 and 4.11 h, respectively; KS-p = 0.72;
Figure 3B; Table 1), although the mRNA half-life for α duplicates
with only transcribed CNSs was lower than that of CNS negative α
duplicates (median 4.24 and 5.02 h, respectively; KS-p = 8.74 × 10^-4;
Table 1; Table S1 in Supplementary Material). Interestingly, there
was a significant decrease in mRNA half-life for α duplicates
with both non-transcribed and transcribed CNSs compared to all
genes (median 2.85 and 4.11 h, respectively; KS-p = 6.99 × 10^-11;
Figure 3B; Table 1), and the mRNA half-life for α
duplicates with both non-transcribed and transcribed CNSs was
also lower than that of CNS negative α duplicates (2.85 and 5.02 h,
respectively; KS-p < 2.20 × 10^-16; Table 1; Table S1 in Supplementary
Material). All pairwise comparisons for mRNA half-life were also
made using Wilcoxon rank sum tests and resulted in similar
patterns of significance (Table S2 in Supplementary Material).

FIGURE 1 | Comparison of mRNA half-life vs. average expression intensity across genome

These results associate CNS annotation with an increase (CNS
positive α duplicates) or decrease (CNS negative α duplicates) in
the rate of mRNA decay relative to the genomic background. In order
to verify this trend using a reverse approach, we isolated the genes
with the fastest rates of mRNA decay (lower quartile of half-life; < 2.23 h)
and genes with the slowest rates of mRNA decay (upper quartile;
> 7.48 h) and looked for enrichment or depletion of CNS
annotation (Figure 4). Genes with the fastest rates of mRNA
decay were enriched in CNS positive α duplicates relative to the
genomic background (20.0 vs. 16.3%, respectively; Fisher's p-value
(FI-p) = 2.05 × 10^-5). Notably, genes with the fastest rates
of mRNA decay were also depleted in CNS negative α duplicates
relative to the genomic background (8.6 vs. 11.4%, respectively;
FI-p = 7.12 × 10^-5). Genes with the slowest rates of mRNA decay
were enriched in CNS negative α duplicates relative to background
(14.3 vs. 11.4%, respectively; FI-p = 8.08 × 10^-5). Genes with the
slowest rates of mRNA decay had no change in the proportion of
CNS positive α duplicates relative to background (15.1 vs. 16.3%,
respectively; FI-p = 0.14).
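
The enrichment tests above can be sketched as a two-sided Fisher's exact test on a 2 × 2 table (gene subset membership vs. quartile membership). The sketch below is an assumption about how such a test could be set up, not the procedure used in this study: the text does not describe how the contingency tables were constructed, the function names are hypothetical, and the counts in the example are invented.

```python
import math

def _log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r)."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def _log_hypergeom_pmf(k, row1, col1, total):
    """Log probability of k in the top-left cell with fixed margins."""
    return (_log_comb(col1, k) + _log_comb(total - col1, row1 - k)
            - _log_comb(total, row1))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    log_observed = _log_hypergeom_pmf(a, row1, col1, total)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        log_p = _log_hypergeom_pmf(k, row1, col1, total)
        if log_p <= log_observed + 1e-7:  # tolerance for floating-point ties
            p_value += math.exp(log_p)
    return min(p_value, 1.0)

# Invented counts: rows are "in fastest-decay quartile" / "not in quartile",
# columns are "CNS positive duplicate" / "other gene".
print(fisher_exact_two_sided(30, 70, 45, 255))
```

Working with log-probabilities keeps the calculation stable for gene counts in the thousands, where the binomial coefficients themselves would overflow floating-point numbers.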

CNS PRESENCE AND BREADTH OF GENE EXPRESSION 

As mentioned previously, in a simple model steady-state mRNA
levels (e.g., AEI) can be explained by the combination of
transcriptional rate and mRNA decay. Since we observed significant
differences in mRNA half-life between CNS positive α duplicates
and CNS negative α duplicates, we hypothesized that any
variance of gene expression across the microarray datasets could
be partially regulated by CNSs through an mRNA decay mechanism.
To determine if the observed changes in mRNA decay based
on CNS annotation could be attributed to broad (many tissues or
conditions) or narrow (few tissues or conditions) gene expression,
we examined the sample variance of expression intensity for all
genes across the 7,016 expression datasets. We selected the metric
τ to quantify the sample variance, as it is similar to the coefficient
of variation (CV) but has been reported to be superior
to CV for measuring breadth of gene expression (Liao and Zhang,
2006). A τ of 1 represents expression in only a single microarray
experiment, while a τ of 0 represents expression across all 7,016
microarray experiments in our study.
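
As an illustration of the two breadth metrics, the sketch below computes τ and CV for a single gene. The formula for τ is not given in the text; the version used here is the commonly cited specificity index, the sum over samples of (1 - x_i / x_max) divided by (n - 1), and it is an assumption that this matches the calculation in this study. Whether the index was applied to raw or log-scale intensities is also not stated. Function names and example values are hypothetical.

```python
import statistics

def tau(expression):
    """Specificity index: 0 for uniform expression, 1 for a single sample.

    Assumed definition: sum(1 - x_i / x_max) / (n - 1), with non-negative
    expression values and at least one value above zero.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two samples")
    x_max = max(expression)
    if x_max <= 0 or min(expression) < 0:
        raise ValueError("expression values must be non-negative, max > 0")
    return sum(1 - x / x_max for x in expression) / (n - 1)

def coefficient_of_variation(expression):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(expression) / statistics.mean(expression)

broad = [7.9, 8.1, 8.0, 7.8]    # hypothetical gene expressed everywhere
narrow = [0.0, 0.0, 0.0, 9.5]   # hypothetical gene expressed in one sample
print(tau(broad), coefficient_of_variation(broad))
print(tau(narrow), coefficient_of_variation(narrow))
```

Both metrics increase as expression becomes narrower, but τ is bounded between 0 and 1, which makes values comparable across genes with different expression levels.
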
FIGURE 2 | The distribution of mRNA half-lives across genes grouped by (A) intron, (B) 5' UTR, and (C) 3' UTR annotation



All α duplicates were then dissected into two gene subsets based
on CNS presence. Unlike rates of mRNA decay, there was no difference
in τ for CNS negative α duplicates relative to all genes (median
0.281 and 0.287, respectively; KS-p = 0.02; Figure 5A; Table 1).
Similarly, CNS positive α duplicates also had no difference in
τ relative to all genes (median 0.296 and 0.287, respectively;
KS-p = 3.11 × 10^-3; Figure 5A; Table 1). Markedly, CNS positive
α duplicates had significantly higher τ (narrower expression)
than CNS negative α duplicates (median 0.296 and 0.281, respectively;
KS-p = 7.14 × 10^-4; Table 1; Table S1 in Supplementary
Material).

FIGURE 3 | The distribution of mRNA half-lives across α duplicates
grouped by (A) CNS presence/absence and (B) subgene position
restricted CNS annotation. Each gene subset is restricted to genes
with annotated 5' UTR, intron, and 3' UTR sequence. The D-value
represents the distance between the distributions and was used in the
Kolmogorov-Smirnov (KS) statistic to determine statistical difference.
CNS, conserved non-coding sequence; NT, non-transcribed; T,
transcribed.

We then examined α duplicates separated into gene subsets
based on CNS subgene position. There was a significant
increase in τ (narrower expression) for α duplicates with only
non-transcribed CNSs relative to all genes (median 0.304 and
0.287, respectively; KS-p = 5.82 × 10^-5; Figure 5B; Table 1). The
increase in τ for α duplicates with only non-transcribed CNSs
was also significant relative to CNS negative α duplicates (median
0.304 and 0.281, respectively; KS-p = 3.74 × 10^-4; Table 1; Table
S1 in Supplementary Material). There was no difference in τ for α
duplicates with only transcribed CNSs relative to all genes (median
0.283 and 0.287, respectively; KS-p = 0.04; Table 1). Additionally,
α duplicates with only transcribed CNSs had no change in τ relative
to CNS negative α duplicates (median 0.283 and 0.281, respectively;
KS-p = 0.19; Figure 5B; Table 1; Table S1 in Supplementary
Material). Interestingly, there was an increase in τ (narrower
expression) for α duplicates with both non-transcribed and transcribed
CNSs relative to all genes (median 0.319 and 0.287, respectively;
KS-p = 2.06 × 10^-7; Figure 5B; Table 1). The increase in τ
for α duplicates with both non-transcribed and transcribed CNSs
was also significant relative to CNS negative α duplicates (median
0.319 and 0.281, respectively; KS-p = 1.85 × 10^-7; Table 1; Table
S1 in Supplementary Material). All pairwise comparisons for τ
were also made using Wilcoxon rank sum tests and resulted
in similar patterns of significance (Table S2 in Supplementary
Material).
Table 1 | Gene expression characteristics of Arabidopsis gene subsets.

Gene subset                        Genes   mRNA HL   τ        CV
All genes                          9958    4.11      0.287    0.099
α Duplicates                       2755    4.21      0.289    0.102
Singleton                          2092    4.35      0.277    0.093
Non-α duplicates                   5111    3.96      0.290    0.101
CNS negative α duplicates          1130    5.02      0.281    0.099
CNS positive α duplicates          1625    3.57*     0.296*   0.103
α Duplicates with only NT CNSs     454     3.51*     0.304*   0.110*
α Duplicates with only T CNSs      703     4.24*     0.283    0.094*
α Duplicates with T and NT CNSs    468     2.85*     0.319*   0.114*

*p-Value < 0.001 via KS test compared to CNS negative α duplicates.
All values shown for mRNA HL, τ, and CV are medians. T, transcribed; CV,
coefficient of variation; HL, mRNA half-life (h); NT, non-transcribed.

FIGURE 4 | Selection of outlier genes with fast and slow rates of
mRNA decay relative to the distribution of mRNA half-lives



CNS' ANNOTATION AND GENE EXPRESSION CHARACTERISTICS 

The initial screen of CNS elements was limited to α duplicate pairs
(Thomas et al., 2007). However, there is the possibility that CNS
elements exist elsewhere in the genome near singletons, non-α
duplicates, or in non-duplicated form surrounding other α duplicates.
We had identified additional CNS elements throughout the
Arabidopsis genome and labeled these elements as CNS' (Spangler
et al., 2012a). We tested for differences in mRNA half-life, τ, and
CV across α duplicates, singletons, and non-α duplicates with and
without CNS' annotation. As per the CNS analysis, we found that
CNS' positive α duplicates had significantly shorter mRNA half-lives
than CNS' negative α duplicates (median 3.95 and 5.29 h,
respectively; KS-p = 8.31 × 10^-8; Table 2; Table S3 in Supplementary
Material). Similar to the CNS-only analysis, there was no
significant difference between CNS' positive α duplicates and CNS'
negative α duplicates for τ (median 0.291 and 0.284, respectively;
KS-p = 0.29; Table 2; Table S3 in Supplementary Material).
Interestingly, there was no difference in mRNA half-life between CNS'
positive singletons and CNS' negative singletons (median 4.21 and
4.59 h, respectively; KS-p = 0.10; Table 2; Table S3 in Supplementary
Material). There was also no difference in mRNA half-life
between CNS' positive non-α duplicates and CNS' negative non-α
duplicates (median 3.95 and 3.98 h, respectively; KS-p = 0.94;
Table 2; Table S3 in Supplementary Material). All pairwise
comparisons for CNS' gene subsets were also made using Wilcoxon
rank sum tests and resulted in similar patterns of significance
(Table S4 in Supplementary Material).

DISCUSSION 

While the ability of CNSs to influence steady-state mRNA levels at 
the transcriptional level has previously been examined, the poten- 
tial for post-transcriptional regulation by CNSs was limited to 
examining IME and predicted 5'-UTR folding energies (Spangler 
et al., 2012b). In this study, we associated the presence of CNSs 
with faster rates of mRNA decay and the absence of CNSs with 
slower rates of mRNA decay. We suggest these differences in rates 
of mRNA decay are partially responsible for changes in breadth
of gene expression (τ and CV). Broadly, this study and previous
results support our working hypothesis that CNSs encode
multiple regulatory mechanisms and influence steady-state mRNA
levels at both transcriptional and post-transcriptional levels.

Within this study we found the presence of CNSs was sufficient
to significantly reduce mRNA half-life by ~0.5 h relative to all
genes and ~1.5 h relative to CNS negative α duplicates (Table 1).
This reduction in mRNA stability was further supported by the
enrichment of CNS positive α duplicates within genes with the
fastest rates of mRNA decay. The reduction in mRNA half-life
appeared to be partially dependent on CNS subgene position, as
α duplicates with only transcribed CNSs were the most similar to
the genomic background and had the smallest difference in mRNA
half-life relative to CNS negative α duplicates. Additionally, there
was no correlation between CNS frequency and rate of mRNA
decay for α duplicates with only transcribed CNSs (Spearman's
rho = -0.08; p = 0.02) or α duplicates with only non-transcribed
CNSs (Spearman's rho = -0.09; p = 0.05). This suggests that the
presence of even a single non-transcribed CNS may be sufficient
to reduce mRNA half-life. We attempted to narrow the effect of
CNSs on mRNA half-life to individual subgene positions (e.g.,
5'-upstream, 5'-UTR), but were unable to detect any significant
differences (data not shown).

The association of non-transcribed CNSs (5'-upstream and
3'-downstream) with an increased rate of mRNA decay is a surprising
finding given that any RNA decay motifs encoded in
the CNS would not be present in the preprocessed or mature
RNA transcript. The mechanism by which non-transcribed CNSs
influence the rate of mRNA decay is unknown, but non-transcribed
CNSs are in phase with increased mRNA decay. It may
be that α duplicates with non-transcribed CNSs are associated with
motifs that are not encoded within the CNS. For example, a number
of genes in Arabidopsis contain miRNA target motifs within
their coding regions (Llave et al., 2002; Rhoades et al., 2002; Chen,
2004), and some genes contain coding region motifs recognized
by RNA binding proteins that reduce transcript stability (Chang
et al., 2004; Lee and Gorospe, 2011). The potential for α duplicates
to contain novel cis-regulatory post-transcriptional motifs within
their coding sequence is interesting and should be considered in
future studies. It is possible that the CNS is coupled to a conserved
coding (i.e., CDS) motif that would be bypassed by the way CNSs
were discovered.

FIGURE 5 | The distribution of τ across α duplicates grouped by
(A) CNS presence/absence and (B) subgene position restricted
CNS annotation. Each gene subset is restricted to genes with
annotated 5' UTR, intron, and 3' UTR sequence. The D-value
represents the distance between the distributions and was used in
the Kolmogorov-Smirnov (KS) statistic to determine statistical
difference. CNS, conserved non-coding sequence; NT,
non-transcribed; T, transcribed.

Table 2 | Gene expression characteristics based on CNS' annotation.

Gene subset                        Genes   mRNA HL   τ        CV
CNS' negative α duplicates         497     5.29      0.284    0.101
CNS' positive α duplicates         2258    3.95*     0.291    0.102
CNS' negative singletons           940     4.59      0.274    0.092
CNS' positive singletons           1152    4.21      0.278    0.094
CNS' negative non-α duplicates     2202    3.98      0.287    0.100
CNS' positive non-α duplicates     2909    3.95      0.291    0.103

*p-Value < 0.001 via KS test compared to CNS' negative α duplicates.
All values shown for mRNA HL, τ, and CV are medians.
CV, coefficient of variation; HL, mRNA half-life (h).

α Duplicates with non-transcribed CNSs and α duplicates with
both non-transcribed and transcribed CNSs demonstrate narrower
expression (higher τ) than CNS negative α duplicates, which
suggests that non-transcribed CNSs may contain cis-regulatory
elements responsible for controlling breadth of gene expression.
However, only α duplicates with both non-transcribed and transcribed
CNSs had lower mRNA half-lives than CNS negative α
duplicates, suggesting that the changes in breadth of expression are
only partially regulated at the level of mRNA decay. The differences
in breadth of expression between the gene subsets we tested
were also maintained using CV as our metric of breadth of gene
expression, although the statistical differences were less defined
than with τ (Tables S1-S4 in Supplementary Material). The similarity
between metrics was due, in part, to a correlation between CV
and τ (Spearman's rho = 0.556; p < 2.20 × 10^-16; Figure 6). These
results further support that τ provides an improved level of resolution
for measuring breadth of gene expression, and that CNSs
assist in the control of the breadth of gene expression.

Spangler and Feltus | CNSs and mRNA decay | www.frontiersin.org | May 2013 | Volume 4 | Article 129

FIGURE 6 | Comparison of breadth of gene expression as measured by coefficient of variation (CV) and τ across genome.

Although α duplicates have a higher expression level (AEI) relative
to other genes in Arabidopsis (Wang et al., 2011; Yang and
Gaut, 2011), we found α duplicates to have only a small increase in
AEI relative to all genes within our dataset (median 7.79 and 7.73,
respectively; KS-p = 2.82 × 10^-4). The small differences in AEI
were also reflected in mRNA half-life, as we found no significant
differences in mRNA half-life between α duplicates, singletons, and
non-α duplicates relative to all genes (Figure 7A). Intriguingly, we
did observe a difference in AEI between CNS positive α duplicates
and CNS negative α duplicates (median 7.68 and 7.94, respectively;
KS-p = 5.09 × 10^-4), further supporting a link between AEI and
mRNA half-life. While there was no effect of gene duplication
status on mRNA half-life, we did observe a significant decrease
in τ for singletons relative to all genes (Figure 7B). This had been
previously observed in Arabidopsis (Yang and Gaut, 2011) and supports
the hypothesis that mRNA stability only partially controls
the breadth of gene expression.

Expanding our analysis to CNS elements outside of α duplicate
gene pairs (CNS'), we found that there was still a significant
difference in mRNA decay between CNS' positive α duplicates
and CNS' negative α duplicates (Table 2). However, CNS'
presence did not have any detectable influence on mRNA half-life
for singletons or non-α duplicates. We propose the following
hypotheses regarding these observations: (i) the DNA sequence
in CNS' elements has diverged sufficiently or lost appropriate
positional proximity such that post-transcriptional regulation was lost;
(ii) CNS elements must be maintained in duplicate form for
post-transcriptional regulation to function correctly; (iii) CNS'
elements are false positive cis-regulatory motifs. There is evidence
to dispute the third hypothesis, as CNS' elements have been found
to overlap with known gene regulatory networks (Spangler et al.,
2012a). Further research on CNS' elements would help to test these
hypotheses.

Rates of mRNA decay have been correlated with several functional
classes of genes, such as kinases, plasma membrane proteins,
and transcription factors (Wang et al., 2002; Yang et al., 2003;
Narsai et al., 2007). Notably, α duplicates are enriched in some
of these functional classes [e.g., transcription factors; (Thomas
et al., 2007)]. In addition, rates of mRNA decay are known to vary
based on various environmental stimuli, such as chemical exposure,
oxidative stress, or DNA damage (Shalem et al., 2008; Elkon
et al., 2010), which would depend on regulatory signals such as
transcription factors. However, upon examination of each CNS
gene subset there was no significant enrichment of functional
terms (e.g., GO, KEGG) beyond annotation previously associated
with α duplicates [e.g., transcription factors, kinases; Table
S5 in Supplementary Material; (Blanc and Wolfe, 2004; Seoighe
and Gehring, 2004; Thomas et al., 2007)]. This suggests
that the differences in mRNA stability associated with CNS presence
or absence cannot be attributed to an obvious functional
class.

FIGURE 7 | The distribution of (A) mRNA half-lives and (B) τ across genes grouped by duplication status.

Our working hypothesis is that CNSs are cis-regulatory DNA
elements that influence mRNA steady-state levels, and the regulatory
mechanisms encoded in the CNSs are a combination of
transcriptional and post-transcriptional control. The prevailing
hypothesis for the fractionation bias observed after most WGD
events is that genes more sensitive to variation in dosage, possibly
conferred by CNS encoded regulation, have a higher impact
on fitness and are more likely to be retained in duplicated gene
pairs (Birchler and Veitia, 2007; Schnable et al., 2012). In this
case, the organism's ability to tightly regulate gene dosage via
an mRNA decay mechanism after a WGD event would provide
a selective advantage. More specifically, within this study we
provide evidence that post-transcriptional control of α duplicate
pairs could be mediated through CNSs via mRNA decay mechanisms.
We have included the list of genes with CNS sequence
and mRNA decay rate for further testing of this hypothesis
at the individual gene level (Table S6 in Supplementary Material).
Although CNSs are only one component of the complete
regulation story, genes with CNSs are more likely to be
maintained across multiple WGD events (Schnable et al., 2011,
2012), and it may be that the regulatory flexibility conferred
by CNSs to regulate gene dosage has played an integral role
in the retention of many α duplicates following the α WGD
event.

MATERIALS AND METHODS 
IDENTIFICATION OF GENE DUPLICATION STATUS 

The list of α duplicate gene pairs was collected from Thomas
et al. (2006) and updated to the TAIR10 annotation, reducing
the list from 3,166 gene pairs to 3,118. Genes with only self BLASTP
hits (E < 10^-10) in the TAIR10 genome were considered
singletons. There were 5,108 genes that met this criterion in the TAIR10
genome. Any gene that was neither an α duplicate nor a singleton was
assigned to the category of non-α duplicates.
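
The three-way assignment described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function name, the input layout (a dictionary of BLASTP hits per query gene), and the gene identifiers are hypothetical, and checking α duplicate membership before the singleton test is an assumption about precedence.

```python
def classify_genes(all_genes, alpha_pairs, blastp_hits, e_cutoff=1e-10):
    """Assign each gene to 'alpha_duplicate', 'singleton', or 'non_alpha_duplicate'.

    blastp_hits maps a query gene to a list of (subject_gene, e_value) tuples
    (hypothetical input layout).
    """
    alpha_genes = {gene for pair in alpha_pairs for gene in pair}
    categories = {}
    for gene in all_genes:
        if gene in alpha_genes:
            categories[gene] = "alpha_duplicate"
            continue
        # A singleton has no significant BLASTP hit other than itself.
        has_other_hit = any(
            subject != gene and e_value < e_cutoff
            for subject, e_value in blastp_hits.get(gene, [])
        )
        categories[gene] = "non_alpha_duplicate" if has_other_hit else "singleton"
    return categories


# Hypothetical identifiers, for illustration only.
example = classify_genes(
    ["G1", "G2", "G3", "G4"],
    [("G1", "G2")],
    {
        "G1": [("G1", 0.0), ("G2", 1e-50)],
        "G3": [("G3", 0.0), ("G1", 1e-5)],   # hit too weak to count
        "G4": [("G4", 0.0), ("G1", 1e-20)],
    },
)
```

In this toy input, G3 has a non-self hit, but its E-value is above the cutoff, so it is still treated as a singleton.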

MICROARRAY COLLECTION AND GENOME ANNOTATION

A total of 7,158 Arabidopsis ATH1 Genome Array experiments
were obtained from NCBI GEO (platform GPL198). RMA
normalization (Irizarry et al., 2003) was performed for all samples
together using the command-line utility of RMAExpress (footnote 1).
Sample outlier detection was performed using the arrayQualityMetrics
(Kauffmann et al., 2009) tool for Bioconductor (Gentleman et al.,
2004). Samples that failed two of the three outlier tests were
removed from the dataset. The remaining dataset consisted of
7,016 microarray experiments. All probe sets were then mapped to
genes using ATH1 mappings available via TAIR (Swarbreck et al.,
2008; footnote 2). Of the original 22,810 probe sets on the ATH1
platform, all Affymetrix control probe sets (prefixed with AFFX),
probe sets that did not map to a gene model in TAIR10 (non-genic),
and probe sets that mapped to multiple loci (ambiguous) were removed.
The final count of probe sets used was 21,107. Any values calculated
for multiple probe sets that mapped to the same gene (redundant) were
averaged. The list of CEL files used can be found in Table S7 in
Supplementary Material.
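
As a rough illustration of the probe set filtering and averaging steps, the following Python sketch applies the three exclusion rules (control, non-genic, ambiguous) and then averages values for redundant probe sets. The function names, the dictionary layout, and the probe set and locus identifiers are assumptions made for the example and do not come from the TAIR mapping file.

```python
from collections import defaultdict


def filter_probe_sets(probe_to_loci):
    """Return {probe_set: gene} for probe sets that map to exactly one locus."""
    kept = {}
    for probe, loci in probe_to_loci.items():
        if probe.startswith("AFFX"):   # Affymetrix control probe set
            continue
        unique_loci = set(loci)
        if len(unique_loci) == 0:      # non-genic: no gene model
            continue
        if len(unique_loci) > 1:       # ambiguous: multiple loci
            continue
        kept[probe] = next(iter(unique_loci))
    return kept


def average_by_gene(probe_values, probe_to_gene):
    """Average per-probe-set values over probe sets that share a gene."""
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        if probe in probe_to_gene:
            grouped[probe_to_gene[probe]].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in grouped.items()}


# Hypothetical probe sets and loci, for illustration only.
mapping = filter_probe_sets({
    "AFFX-ctrl": ["GENE_A"],
    "p1": ["GENE_A"],
    "p2": ["GENE_A"],
    "p3": [],
    "p4": ["GENE_B", "GENE_C"],
})
gene_values = average_by_gene({"p1": 2.0, "p2": 4.0, "p3": 9.0, "p4": 7.0}, mapping)
```

Here only p1 and p2 survive filtering, and because both map to the same gene their values are averaged.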

mRNA STABILITY ESTIMATES 

Observed mRNA half-lives were collected from the supplementary
information of Narsai et al. (2007) and included data for
13,012 probe sets. Non-genic and ambiguous probe sets were
excluded, leaving a final count of 12,327 probe sets for
analysis. Half-lives for multiple probe sets that mapped to the
same gene (redundant) were averaged, resulting in 12,189
genes. The distributions of mRNA half-life were compared using
the Kolmogorov-Smirnov test (KS test) and the Wilcoxon rank-sum
test (Wilcox test) in R. The associated p-values can be found in
Tables S1-S4 in Supplementary Material.
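
To make the two comparisons concrete, the sketch below computes the test statistics behind a two-sample KS test and a rank-sum test in plain Python. It is not the R code used in the study and it deliberately stops at the statistics (no p-values); the function names and the toy half-life values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic D: the largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d


def rank_sum_u(x, y):
    """Mann-Whitney U for sample x: pairs with x > y, counting ties as 0.5.

    Quadratic in sample size, which is fine for a sketch.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Toy half-lives (hours) for two hypothetical gene groups.
with_cns = [1.0, 2.0, 3.0, 4.0]
without_cns = [3.0, 4.0, 5.0, 6.0]
d_stat = ks_statistic(with_cns, without_cns)
u_stat = rank_sum_u(with_cns, without_cns)
```

The KS statistic is sensitive to any difference in distribution shape, while the rank-sum statistic mainly reflects a shift in location, which is one reason to report both.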

BREADTH OF GENE EXPRESSION 

The breadth of gene expression was measured with the index τ
(Yanai et al., 2005; Yang and Gaut, 2011):

$$\tau = \frac{\sum_{j=1}^{n} \left[ 1 - \frac{\log_2 S(i,j)}{\log_2 S(i,\max)} \right]}{n - 1}$$

Here S(i, j) is the expression intensity of probe set i in
microarray experiment j, n is the number of microarray experiments,
and S(i, max) represents the maximum expression intensity for the
given probe set across all microarray experiments. Genes with
τ = 0 represent expression across all microarrays, while genes
expressed in only one microarray will approach τ = 1. Breadth
of gene expression was also measured using the coefficient of
variation (CV = σ/μ) for each probe set.
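
A minimal Python sketch of both breadth measures for a single probe set follows. It assumes that the intensities are on a linear scale with a maximum above 1 (so that the log2 of the maximum is positive) and that the CV uses the sample standard deviation; neither detail is stated in the text, and the function names are hypothetical.

```python
import math


def tau(intensities):
    """Breadth index tau for one probe set across n microarray experiments."""
    n = len(intensities)
    s_max = max(intensities)
    if n < 2 or s_max <= 1:
        raise ValueError("need at least two experiments and a maximum above 1")
    log_max = math.log2(s_max)
    return sum(1 - math.log2(s) / log_max for s in intensities) / (n - 1)


def coefficient_of_variation(intensities):
    """CV = sigma / mu, using the sample standard deviation (an assumption)."""
    n = len(intensities)
    mean = sum(intensities) / n
    variance = sum((s - mean) ** 2 for s in intensities) / (n - 1)
    return math.sqrt(variance) / mean


broad = tau([8.0, 8.0, 8.0])        # same intensity in every experiment
narrow = tau([1024.0, 1.0, 1.0])    # signal in a single experiment
cv = coefficient_of_variation([2.0, 4.0, 6.0])
```

The two toy probe sets reproduce the limiting cases described above: uniform expression gives τ = 0, and expression confined to one experiment gives τ = 1.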

1 http://rmaexpress.bmbolstad.com/

2 affy_ATH1_array_elements-2010-12-20.txt; ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/

FUNCTIONAL ENRICHMENT WITHIN CNS SUBGENE POSITION
EXCLUSIVE α DUPLICATES

α duplicates were separated into CNS positive α duplicates, CNS
negative α duplicates, α duplicates with only non-transcribed
CNSs, α duplicates with only transcribed CNSs, and α duplicates
with both non-transcribed and transcribed CNSs. These
gene lists were then tested for enrichment of functional terms
with a DAVID-like (Huang et al., 2007) functional profiling
strategy implemented in in-house Perl scripts (Huang et al., 2008;
Ficklin et al., 2010). All terms were tested for enrichment across
each gene list via a Fisher's exact test using a Perl script. Any
terms with a Bonferroni-corrected p < 0.001 were considered
significantly enriched. All GO (footnote 3) and InterPro (footnote 4)
annotations were downloaded from TAIR. All TAIR10
peptide sequences (TAIR10_pep_20101214.txt) were downloaded
from the location given in footnote 5 and submitted to the KEGG
Automatic Annotation Server on 10-26-2011 (Moriya et al., 2007).
All Pfam domains were obtained from the Sanger database
(footnote 6). Enrichment of functional
terms including gene ontology (GO), protein domains (InterPro
and Pfam) and biochemical pathways (KEGG) can be found in
Table S5 in Supplementary Material.
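
The enrichment test can be illustrated with a short Python sketch of Fisher's exact test followed by a Bonferroni correction. The study used in-house Perl scripts, so this is only an approximation of the idea: the one-sided (over-representation) form of the test, the function names, and the term labels are assumptions.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation.

    k: genes in the list carrying the term    n: size of the gene list
    K: background genes carrying the term     N: size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enriched_terms(term_counts, n, N, alpha=0.001):
    """Return terms whose Bonferroni-adjusted p-value is below alpha.

    term_counts maps a term to (k, K); the adjustment multiplies each
    p-value by the number of terms tested and caps the result at 1.
    """
    n_tests = len(term_counts)
    significant = {}
    for term, (k, K) in term_counts.items():
        p_adjusted = min(1.0, fisher_enrichment_p(k, n, K, N) * n_tests)
        if p_adjusted < alpha:
            significant[term] = p_adjusted
    return significant


# Toy counts with hypothetical term labels; alpha is relaxed for the small example.
hits = enriched_terms({"term_A": (5, 5), "term_B": (1, 5)}, n=5, N=10, alpha=0.05)
```

With these toy counts only term_A passes, because all five annotated background genes fall inside the five-gene list.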

CNS ANNOTATION 

All CNS annotation was collected from the supplemental data of
Spangler et al. (2012b).

CNS' ANNOTATION 

All CNS' annotation was collected from the supplemental data
of Spangler et al. (2012a). The associated p-values from all
Kolmogorov-Smirnov tests and Wilcoxon rank-sum tests with
CNS' can be found in Tables S3 and S4 in Supplementary Material.

TAIR10 UTR ANNOTATION 

All TAIR10 5'-UTR, intron, and 3'-UTR sequences were
downloaded from TAIR (TAIR10_5_utr_20101028,
TAIR10_intron_20101028 and TAIR10_3_utr_20101028).

3 ATH_GO_GOSLIM.txt; ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology; 10-25-2011

4 TAIR10_all.domains; ftp://ftp.arabidopsis.org/home/tair/home/tair/Proteins/Domains/; 11-18-2010

5 ftp://ftp.arabidopsis.org/home/tair/Proteins/TAIR10_protein_lists

6 ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database_files/pfamA.txt.gz